Abstract In the past decade, Social Tagging Systems have attracted increasing attention from both the
physical and computer science communities. Besides studying the underlying structure and dynamics of
tagging systems, much effort has been devoted to exploiting tagging information to reveal user behaviors
and preferences, extract the latent semantic relations among items, make recommendations, and so on.
Specifically, this article summarizes recent progress about tag-aware recommender systems, emphasiz- 
1 Introduction

The last few years have witnessed an explosion of information: the exponential growth of the
Internet [1] and the World Wide Web [2] confronts us with information overload. There are too many
data and sources for us to find those most relevant to us. Indeed, we have to choose from thousands
of movies, millions of books, billions of web pages, and so on, and evaluating all these alternatives
by ourselves is not feasible. As a consequence, an urgent problem is how to automatically find the
items relevant to us. The Internet search engine [3], with the help of keyword-based queries, is an
essential tool for getting what we want from the web. However, a search engine does not take
personalization into account and returns the same results to people with very different habits. In
addition, not all needs or tastes can be easily expressed by keywords. By comparison, the recommender
system [4], which adopts knowledge discovery techniques to provide personalized recommendations, is
now considered the most promising way to efficiently filter out the overloaded information. Thus far,
recommender systems have found successful applications in e-commerce [5], such as book
recommendations at Amazon.com [6], movie recommendations at Netflix.com [7], video recommendations
at TiVo.com [8], and so on.

A recommender system automatically provides personalized recommendations based on the historical
record of users' activities. These activities are usually represented by the connections in a
user-item bipartite graph [9], [10]. So far, collaborative filtering (CF) is the most successful
technique in the design of recommender systems [11]: a user is recommended items that people with
similar tastes and preferences liked in the past. Despite its success, the performance of CF is
strongly limited by the sparsity of data, which results from two facts: (i) the huge number of items
is far beyond a user's ability to evaluate even a small fraction of them; (ii) users have little
incentive to rate the items they have purchased or viewed [12]. Besides the fundamental user-item
relations, some accessory information can be exploited to improve algorithmic accuracy [13]. User
profiles, usually including age, sex, nationality, job, etc., can be treated as prior information to
filter out possibly irrelevant recommendations [14]; however, such applications are mostly forbidden
or strongly restricted in order to respect personal privacy. The attribute-aware method [15] takes
into account item attributes, which are defined by domain experts. Yet it is limited by the attribute
vocabulary, and, moreover, attributes describe global properties of items, which are essentially not
helpful for generating personalized recommendations. In addition, content-based algorithms can
provide very accurate recommendations [16]. However, they are only effective if the items contain
rich content information that can be automatically extracted; for example, these methods are suitable
for recommending books, articles and bookmarks, but not for videos, tracks or pictures.

Recently, network theory has provided a powerful and versatile tool to recognize and analyze such
relation-based complex systems, where nodes represent individuals and links denote the relations
among them. Many social, biological, technological and information systems can therefore be
represented as complex networks, and a vast amount of effort has been devoted to understanding the
structure, evolution and dynamics of complex networks [17], [18], [19], [20], [21]. However, the
advent of Web 2.0 and its affiliated applications has brought a new user-centric paradigm that cannot
be fully described by pre-existing models on either unipartite or bipartite networks. One such
example is the user-driven emerging phenomenon of folksonomy [22], which not only allows users to
upload resources (bookmarks, photos, movies, publications, etc.) but also lets them freely assign
user-defined words, so-called tags, to those resources. Folksonomy requires no specific skills for
users to participate, broadens the semantic relations among users and resources, and achieved
immediate success within a few years. Presently, a large number of such applications can be found
online, such as Del.icio.us (with tags of bookmarks by users), MovieLens (with ratings of movies by
users), CiteULike (with tags of publications by users), BibSonomy (with tags of bookmarks and
references by users), Flickr (with tags of images by users), Last.fm (with tags of music by users),
etc. From the viewpoint of physics, all these online systems exhibit similar statistical properties,
e.g. a Zipf's-law-like rank-frequency distribution [23] and Heaps'-law growth [23], and the relation
between the two has been studied in depth in recent works [25], [26]. With the help of those
platforms, users can not only store their own resources and manage them with social tags, but also
look into other users' collections to find what they might be interested in by simply keeping track
of the baskets with tags. Unlike traditional information management methods, where words (or indices)
are normally pre-defined by experts or administrators (e.g. library classification systems), a
tagging system allows users to create arbitrary tags, even ones that do not exist in dictionaries.
Therefore, those user-defined tags can reflect user behaviors and preferences, through which users
can easily get acquainted, collaborate and eventually form communities with others who have similar
interests [27].

2 Overview of Tag-based Recommender Systems

Nowadays, people are confronted with a huge amount of information and spend much effort searching
for relevant or interesting items. However, as discussed in the previous section, it is impossible
for individuals to filter metadata from various structures and a massive number of sources,
especially in an era of user-generated information [28]. The motivation for users' contributions is
straightforward: they build their own data, based on which they become further involved in web-based
communications. Social tagging is becoming one of the most popular tools and plays important roles in
various social activities. Ding et al. [29] provided a good overview of social tagging systems with
emphasis on both their social impact and ontology modeling.



As a consequence, social tags can naturally be considered as an additional yet useful resource for
designing effective recommendation algorithms. Firstly, tags are freely assigned by users and can
therefore reflect their personalized preferences. Secondly, tags express the semantic relations among
items, which can help in evaluating the underlying item qualities. Thirdly, the co-occurrence
properties of tags can be employed to build both user communities and item clusters, which can be
further used to find relevant yet interesting items for targeted individuals. Therefore, tags provide
a promising way to solve some stubborn problems in recommender systems, e.g. the cold-start
problem [30].

To date, a remarkable amount of research has discussed how to apply tags in the domain of
recommender systems. Hotho et al. [31] proposed a modified PageRank [3] algorithm, namely FolkRank,
to rank tags in folksonomies under the assumption that important tags are given by important users,
which is akin to the HITS [32] algorithm in Internet networks. FolkRank was then adopted to recommend
tags [33]. In addition, because they are user-generated, tags are considered to carry highly
personalized information and hence can be used to design methods for both personalized search [31]
and recommendation. A good overview of social bookmarking and its applications in recommender systems
can be found in a recent Ph.D. thesis [35]. However, although tags are especially useful for both
organizing and searching resources, many studies argue that not all tags benefit recommendation [36]
because of various limitations of tags, such as polysemy, synonymy and ambiguity [22], [37], [38],
[39]. These flaws are side effects of the uncontrolled vocabulary, and they leave some open issues in
tagging systems: (i) singularity vs. plurality: e.g. the words cat and cats have very similar
meanings but are treated as two different tags; (ii) polysemy vs. synonymy: e.g. the word apple may
refer to a kind of fruit, while it can also indicate the well-known computer company, Apple Inc., as
well as its products; on the other hand, the words mac, macintosh, and apple all point to the
products of Apple Inc., yet tagging systems fail to uncover their underlying relations; (iii)
different online tagging systems allow users to give tags in different formats, e.g. Del.icio.us only
allows single words to be assigned, which results in compound words joined by various symbols (e.g.
underscore, dash, colon, etc.) and leads to an unlimited variety of metadata formats. Such freestyle
tags additionally inflate the observed datasets and hence interfere with the analysis of the
structure and user behaviors in tagging systems. Recently, researchers have devoted much effort to
solving those issues. Firstly, clustering-based methods [40], [41] have been proposed to alleviate
the word reduction problem. Secondly, semantic methods have been discussed that use ontologies to
organize tags and reveal the semantic relations among them [42], [43]. Thirdly, dimension reduction
and topic-based methods have been put forward to discover latent topics [44], [45], and graph-based
methods have been proposed [46], [47] to solve the sparsity problem in large-scale datasets.

In the following, we first give the evaluation metrics used in this survey. Secondly, we summarize
some of the most recent and prominent tag-aware recommendation algorithms, showing and discussing how
they make use of the aforementioned representations to address some unresolved issues in recommender
systems. Basically, there are three kinds of recommendations in social tagging systems: (i)
predicting friends for users; (ii) recommending items to users; (iii) pushing interesting topics
(tags) to users. However, as mentioned above, the most urgent problem in the information era is to
filter out irrelevant items for individuals. Therefore, in this survey we mainly discuss the second
case, and introduce related methods addressing (i) or (iii) where necessary. Finally, we conclude
with a comparison of the surveyed methods and outline some future challenges for tag-aware
recommendation algorithms.

3 Tag-Aware Recommendation Models 

Formally, a social tagging network consists of three different kinds of communities: users, items
and tags. Together they form a set of entries of a personalized folksonomy, or personomy [48], each
of which follows the form $F = \{\text{user}, \text{item}, \text{tag}_1, \text{tag}_2, \cdots,
\text{tag}_t\}$, where $t$ is the number of tags assigned to the item by that user. Correspondingly,
in a recommender system, a full folksonomy can be considered in two ways. (i) It consists of three
sets, namely users $U = \{U_1, U_2, \cdots, U_n\}$, items $I = \{I_1, I_2, \cdots, I_m\}$, and tags
$T = \{T_1, T_2, \cdots, T_r\}$. Consequently, each binary relation can be described by an adjacency
matrix: $A$, $A'$ and $A''$ for user-item, item-tag and user-tag relations, respectively. If $U_i$
has collected $I_j$, we set $a_{ij} = 1$, otherwise $a_{ij} = 0$. Analogously, we set $a'_{jk} = 1$
if $I_j$ has been assigned the tag $T_k$, and $a'_{jk} = 0$ otherwise. Furthermore, the users'
preferences on tags are represented by $A''$, where $a''_{ik} = 1$ if $U_i$ has adopted $T_k$ and
$a''_{ik} = 0$ otherwise. (ii) It is a ternary [48], [49] or hypergraph-based [50], [51], [52]
structure, in which only a complete ternary relation is counted as an existing link. That is to say,
each relation $(u, i, t)$ is represented by a component $Y = 1$ if it exists in the folksonomy $F$,
and $Y = 0$ otherwise.
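
To make the two representations concrete, the following minimal sketch builds both of them from a
list of (user, item, tag) assignments. The function name `build_relations`, the input format, the toy
data and the use of Python sets in place of 0/1 matrices are illustrative assumptions, not part of
the surveyed methods.

```python
from collections import defaultdict


def build_relations(assignments):
    """Build the three binary relations and the ternary relation.

    assignments: iterable of (user, item, tag) triples (hypothetical input format).
    Sets stand in for the 0/1 matrices: i in user_items[u] means a_ui = 1.
    """
    user_items = defaultdict(set)  # A:   user-item relation
    item_tags = defaultdict(set)   # A':  item-tag relation
    user_tags = defaultdict(set)   # A'': user-tag relation
    triples = set()                # Y:   complete ternary relations
    for user, item, tag in assignments:
        user_items[user].add(item)
        item_tags[item].add(tag)
        user_tags[user].add(tag)
        triples.add((user, item, tag))
    return user_items, item_tags, user_tags, triples


# Toy folksonomy (made-up data).
toy = [("u1", "i1", "python"), ("u1", "i2", "web"), ("u2", "i1", "code")]
user_items, item_tags, user_tags, triples = build_relations(toy)

# u2 collected i1 and i1 carries the tag "python", yet u2 never assigned it:
# the user-item and item-tag relations alone cannot tell, the ternary view can.
pairwise_hit = "i1" in user_items["u2"] and "python" in item_tags["i1"]
ternary_hit = ("u2", "i1", "python") in triples
```

In this toy case `pairwise_hit` is true while `ternary_hit` is false, which illustrates the
information that is lost when a folksonomy is decomposed into bipartite relations.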

3.1 Evaluation Metrics 

For a traditional recommender system, each data set $E$ is randomly divided into two parts to test
the performance of the proposed algorithms: the training set $E^P$ is treated as known information,
while the testing set $E^T$ is used for testing. In this survey, the training set always contains 90%
of the entries, and the remaining 10% constitute the testing set. In addition, each division should
guarantee $E^T \cap E^P = \emptyset$ and $E^T \cup E^P = E$, so that no redundant information is
used. Furthermore, to give a solid and comprehensive evaluation of the proposed algorithms, we
consider metrics of both accuracy [53] and diversity [51] to characterize the performance of
recommendations.
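
A random 90/10 division of this kind can be sketched as follows. The helper name `split_entries`,
the fixed seed and the toy entries are our own assumptions for illustration; the entries are assumed
to be distinct.

```python
import random


def split_entries(entries, test_fraction=0.1, seed=0):
    """Randomly divide distinct entries into disjoint training and testing sets."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test = set(shuffled[:n_test])    # E^T: held out for testing
    train = set(shuffled[n_test:])   # E^P: treated as known information
    return train, test


# 100 made-up (user, item) entries.
entries = [(u, i) for u in range(10) for i in range(10)]
train, test = split_entries(entries)
```

By construction the two sets are disjoint and their union recovers the original data, which are the
two conditions required of each division.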

3.1.1 Metrics of Accuracy 

1. Ranking Score (RS) [10]. In the present case, for each entry in the testing set (i.e. a
   user-item pair), RS is defined as the rank of the item divided by the number of all uncollected
   items for the corresponding user. Clearly, the smaller the RS, the more accurate the algorithm.
   The mean ranking score $\langle RS \rangle$ is obtained by averaging over all entries in the
   testing set.

2. The area under the ROC curve [55], [56]. In the present case, the area under the ROC curve,
   abbreviated as AUC, for a particular user is the probability that a randomly selected removed
   item for this user (i.e. an item in the testing set that has been collected by this user) is
   given a higher score by the algorithm than a randomly selected uncollected item (i.e. an item
   that appears for this user in neither the training set nor the testing set). The AUC for the
   whole system is the average over all users. If all the scores are generated from an independent
   and identical distribution, AUC $\approx 0.5$. Therefore, the degree to which the AUC exceeds 0.5
   indicates how much better the algorithm performs than pure chance.

3. Recall [11]. Note that the AUC takes into account the order of all uncollected items; however,
   in real applications, users might only care about the recommended items, that is, the items with
   the highest scores. Therefore, a complementary measure, recall, is employed to quantify the
   accuracy of the recommended items, which is defined as:

   $$ \mathrm{Recall} = \frac{1}{n} \sum_{i=1}^{n} N_r^i / N_p^i, \tag{1} $$

   where $N_p^i$ is the number of items collected by $U_i$ in the testing set, and $N_r^i$ is the
   number of recovered items in the recommendations for $U_i$. We use the averaged recall instead of
   simply counting $N_r / N_p$ with $N_r = \sum_i N_r^i$ and $N_p = \sum_i N_p^i$, since it is fair
   to give the same weight to every user in the algorithm evaluation. Assuming the length of the
   recommendation list, $L$, is fixed for every user, recall is very sensitive to $L$, and a larger
   $L$ generally gives a higher recall.
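
The three accuracy metrics can be sketched for a single user as follows; system-level values are
then obtained by averaging as described above. The function names, the dictionary-based score
representation, the toy scores and the convention of counting ties as one half in the AUC are our
own assumptions for illustration.

```python
def ranking_score(scores, collected_train, test_item):
    """Rank of test_item among the items uncollected in training, divided by their number."""
    candidates = [i for i in scores if i not in collected_train]
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return (ranked.index(test_item) + 1) / len(candidates)


def auc(scores, test_items, irrelevant_items):
    """Fraction of (test, irrelevant) pairs ordered correctly; ties count 0.5 (assumption)."""
    wins = 0.0
    for t in test_items:
        for x in irrelevant_items:
            if scores[t] > scores[x]:
                wins += 1.0
            elif scores[t] == scores[x]:
                wins += 0.5
    return wins / (len(test_items) * len(irrelevant_items))


def recall_at(scores, collected_train, test_items, L):
    """Share of the user's test items recovered in the top-L recommendation list."""
    candidates = [i for i in scores if i not in collected_train]
    top = sorted(candidates, key=lambda i: scores[i], reverse=True)[:L]
    return len(set(top) & set(test_items)) / len(test_items)


# Made-up scores of five items for one user.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3, "e": 0.1}
collected_train = {"a"}   # known in the training set
test_items = {"b", "d"}   # removed into the testing set
irrelevant = {"c", "e"}   # in neither set

mean_rs = sum(ranking_score(scores, collected_train, t) for t in test_items) / len(test_items)
user_auc = auc(scores, test_items, irrelevant)
user_recall = recall_at(scores, collected_train, test_items, L=2)
```

For this toy user the test items are ranked first and third among four candidates, so the mean
ranking score is 0.5, three of the four (test, irrelevant) pairs are ordered correctly, and one of
the two test items appears in the top-2 list.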

3.1.2 Metrics of Diversity 

1. Inter Diversity (InterD) [57], [10]. InterD measures the differences between different users'
   recommendation lists, and thus can be understood as the inter-user diversity. Denoting by
   $P_R^i$ the set of recommended items for user $U_i$, then

   $$ \mathrm{InterD} = \frac{1}{n(n-1)} \sum_{i \neq j} \left( 1 - \frac{|P_R^i \cap P_R^j|}{L} \right), \tag{2} $$

   where $L = |P_R^i|$ is the length of the recommendation list. On average, greater or lesser
   InterD means respectively greater or lesser personalization of users' recommendation lists.

2. Inner Diversity (InnerD) [57]. InnerD measures the differences between the items within a
   user's recommendation list, and thus can be considered as the inner-user diversity. It reads

   $$ \mathrm{InnerD} = 1 - \frac{1}{n L (L-1)} \sum_{i=1}^{n} \sum_{j \neq l} S_{jl}, \tag{3} $$

   where the inner sum runs over pairs of items $I_j, I_l \in P_R^i$, and
   $S_{jl} = |\Gamma_j \cap \Gamma_l| / \sqrt{|\Gamma_j| \, |\Gamma_l|}$ is the cosine similarity
   between items $I_j$ and $I_l$, where $\Gamma_j$ denotes the set of users having collected item
   $I_j$. On average, greater or lesser InnerD suggests respectively greater or lesser topic
   diversification of users' recommendation lists.
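
Both diversity metrics can be sketched as plain averages over pairs. The sketch below assumes that
every user receives a list of the same length and averages over unordered pairs, which gives the same
mean as averaging over ordered pairs because both quantities are symmetric. All names and the toy
data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def inter_diversity(rec_lists):
    """Mean of 1 - overlap / L over all pairs of users."""
    L = len(next(iter(rec_lists.values())))
    pairs = list(combinations(rec_lists, 2))
    total = sum(1 - len(set(rec_lists[a]) & set(rec_lists[b])) / L for a, b in pairs)
    return total / len(pairs)


def cosine(item_users, j, l):
    """Cosine similarity between two items from the sets of users who collected them."""
    gj, gl = item_users[j], item_users[l]
    return len(gj & gl) / sqrt(len(gj) * len(gl))


def inner_diversity(rec_lists, item_users):
    """One minus the mean pairwise item similarity within each list, averaged over users."""
    per_user = []
    for items in rec_lists.values():
        pairs = list(combinations(items, 2))
        per_user.append(sum(cosine(item_users, j, l) for j, l in pairs) / len(pairs))
    return 1 - sum(per_user) / len(per_user)


# Made-up recommendation lists (L = 2) and item audiences.
rec_lists = {"u1": ["a", "b"], "u2": ["b", "c"], "u3": ["a", "b"]}
item_users = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"z"}}

inter = inter_diversity(rec_lists)
inner = inner_diversity(rec_lists, item_users)
```

Here two of the three user pairs share one of two items and one pair shares both, so `inter`
evaluates to 1/3.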



3.2 Network-based Models 



Recently, a variety of attempts have been made to utilize tagging information for recommendation
from the perspective of graph theory. Generally, a tag-based network can be viewed as a tripartite
graph, which consists of three integrated bipartite graphs [10], or as a hypergraph. Therefore,
network-based methods are widely used to describe the tag-based graph. To date, bipartite graphs have
been widely applied to depict a massive number of online applications: for example, users rate
movies, customers comment on books, individuals participate in online games, etc. In a typical
bipartite graph there are two communities that are connected to each other, with no links within
either community, as shown in Fig. 1.




Fig. 1. (Color online) Illustration of a user-item bipartite network [58] composed of 6 users and
8 items, in which only inter-community links are allowed.

Inspired by this elegant structure, two underlying network-based methods, Probability Spreading
(ProbS) [10], [51] and Heat Spreading (HeatS) [59], [31], were proposed as starting points for
applying network theory to recommender systems.

ProbS is also known as random walk (RW) in computer science and mass diffusion (MD) in physics.
Given a target user $U_i$, ProbS generates the final score of each item, $f_j$, for her/him according
to the following rules.

Suppose that a kind of resource is initially located on the items collected by the target user.
Each item distributes its resource equally to all neighboring users, and then each user redistributes
the received resource equally to all his/her collected items. The final resource vector for the
target user $U_i$, $f^p$, after the two-step mass diffusion is:

$$ f_j^p = \sum_{l=1}^{n} \sum_{s=1}^{m} \frac{a_{lj} a_{ls} a_{is}}{k(U_l) k(I_s)}, \quad j = 1, 2, \cdots, m, \tag{4} $$

where $k(U_l) = \sum_{j=1}^{m} a_{lj}$ is the number of collected items for user $U_l$, and
$k(I_s) = \sum_{i=1}^{n} a_{is}$ is the number of neighboring users for item $I_s$.
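
The two-step mass diffusion can be sketched on a small 0/1 user-item matrix stored as nested lists.
This is a minimal illustration under our own assumptions (the function name `probs_scores`, the toy
matrix and the skipping of zero-degree nodes), not an optimized implementation.

```python
def probs_scores(a, target):
    """Two-step mass diffusion (ProbS) for one target user.

    a: n x m 0/1 user-item matrix as nested lists; target: row index of the target user.
    """
    n, m = len(a), len(a[0])
    k_user = [sum(row) for row in a]                             # items per user
    k_item = [sum(a[l][s] for l in range(n)) for s in range(m)]  # users per item
    # Step 1: every item collected by the target splits one unit equally among its users.
    user_res = [
        sum(a[l][s] * a[target][s] / k_item[s] for s in range(m) if k_item[s])
        for l in range(n)
    ]
    # Step 2: every user splits the received resource equally among his/her items.
    return [
        sum(a[l][j] * user_res[l] / k_user[l] for l in range(n) if k_user[l])
        for j in range(m)
    ]


# Made-up data: 3 users x 3 items; user 0 collected item 0 only.
a = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
scores = probs_scores(a, target=0)
```

The total resource is conserved: the target user starts with one unit on item 0, and the final
scores (2/3, 1/6, 1/6) again sum to one.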

Fig. 2. (Color online) Illustration of a user-item-tag tripartite graph consisting of 3 users,
5 items and 4 tags, as well as the recommendation process described in [36]. The tripartite graph is
decomposed into user-item (black links) and item-tag (red links) bipartite graphs connected by items.
For the target user $U_1$, the scoring process works as follows: (a) firstly, highlight the items
$I_1$, $I_3$, $I_5$ collected by the target user $U_1$ and mark them with unit resource, that is,
$f_{I_1} = f_{I_3} = f_{I_5} = 1$ and $f_{I_2} = f_{I_4} = 0$; (b) secondly, distribute the resources
from items to their corresponding users and tags, respectively; (c) finally, redistribute the
resources from users and tags to their neighboring items.

Comparatively, HeatS works by the reverse rules of ProbS. At each step, each target receives
resource according to how active or popular it is, whereas in ProbS each source distributes resource
according to its own activity or popularity. Thus, the final resource vector for the target user
$U_i$, $f^h$, after the two-step heat spreading is:

$$ f_j^h = \frac{1}{k(I_j)} \sum_{l=1}^{n} \sum_{s=1}^{m} \frac{a_{lj} a_{ls} a_{is}}{k(U_l)}, \quad j = 1, 2, \cdots, m. \tag{5} $$

Therefore, HeatS depresses the scores of popular items and is inclined to recommend relatively
cold ones, while ProbS enhances the scores of popular items.
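
The heat-spreading rule can be sketched in the same nested-list setting: each receiver averages over
its neighbors instead of each sender splitting its resource. The function name `heats_scores`, the
toy matrix and the treatment of zero-degree nodes are our own assumptions for illustration.

```python
def heats_scores(a, target):
    """Two-step heat spreading (HeatS) for one target user.

    a: n x m 0/1 user-item matrix as nested lists; target: row index of the target user.
    """
    n, m = len(a), len(a[0])
    k_user = [sum(row) for row in a]                             # items per user
    k_item = [sum(a[l][s] for l in range(n)) for s in range(m)]  # users per item
    # Step 1: each user takes the average resource of the items he/she collected.
    user_temp = [
        sum(a[l][s] * a[target][s] for s in range(m)) / k_user[l] if k_user[l] else 0.0
        for l in range(n)
    ]
    # Step 2: each item takes the average over the users who collected it.
    return [
        sum(a[l][j] * user_temp[l] for l in range(n)) / k_item[j] if k_item[j] else 0.0
        for j in range(m)
    ]


# Made-up data: item 1 is collected by two users, item 2 by only one.
a = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
scores = heats_scores(a, target=0)
```

For target user 0 the less popular item 2 receives 1/3 while the more popular item 1 receives 1/6,
which illustrates the preference of HeatS for cold items. On the same toy matrix the mass-diffusion
rule of Eq. (4) assigns 1/6 to both uncollected items.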

Based on the aforementioned methods, a variety of algorithms have been proposed that add tags in
order to generate better recommendation performance. Zhang et al. [16] first proposed a tag-aware
diffusion-based method that considers tags as additional information and extends the bipartite
paradigm to a set of reduced bipartite graphs, known as a tripartite graph. In such a graph, one kind
of node (users, items or tags) plays a central role in bridging the remaining two. Fig. 2 shows an
example of the item-centric model. In such a graph, each item of a target user distributes its
resource to its neighboring users and tags, respectively, and then all the items in the database
receive resource from their neighboring nodes. Hence, the final resource for the target user $U_i$,
$f^t$, after the two-step mass diffusion (see Fig. 2), is integrated in a linear way:

$$ f_j^t = \lambda f_j^p + (1 - \lambda) f_j^{pt}, \tag{6} $$

where $f_j^{pt} = \sum_{l=1}^{r} \sum_{s=1}^{m} \frac{a'_{jl} a'_{sl} a_{is}}{k(T_l) k'(I_s)}$ is the
resource of item $j$ received from the item-tag bipartite graph, $k(T_l) = \sum_{j=1}^{m} a'_{jl}$ is
the number of neighboring items for tag $T_l$, $k'(I_s) = \sum_{l=1}^{r} a'_{sl}$ is the number of
neighboring tags for item $I_s$, and $\lambda \in [0, 1]$ is a tunable parameter to obtain the
optimal performance. Table 1 shows the corresponding AUC results for three datasets, Del.icio.us,
MovieLens and BibSonomy, in which the AUC values are significantly improved by considering the
item-tag bipartite relation. In addition, [46] also experimentally demonstrated that the
incorporation of tags can enhance the Recall results for various recommendation list lengths. Besides
accuracy, [36] extensively showed that tags can also promote recommendation diversification, and
hence broaden the range of choices for users.
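
The linear integration of the two diffusions can be sketched with one generic two-step routine that
is applied once through users and once through tags. The names `diffuse` and `tag_aware_scores`, the
item-by-bridge matrix layout and the toy data are our own assumptions; the sketch illustrates the
idea rather than reproducing the authors' implementation.

```python
def diffuse(b, seed):
    """Two-step mass diffusion from items through bridge nodes (users or tags) and back.

    b: m x q 0/1 item-to-bridge matrix as nested lists; seed: initial resource on the m items.
    """
    m, q = len(b), len(b[0])
    k_item = [sum(row) for row in b]
    k_bridge = [sum(b[j][l] for j in range(m)) for l in range(q)]
    bridge = [
        sum(b[s][l] * seed[s] / k_item[s] for s in range(m) if k_item[s])
        for l in range(q)
    ]
    return [
        sum(b[j][l] * bridge[l] / k_bridge[l] for l in range(q) if k_bridge[l])
        for j in range(m)
    ]


def tag_aware_scores(item_user, item_tag, seed, lam):
    """Linear combination of user-item and item-tag diffusion with weight lam in [0, 1]."""
    f_p = diffuse(item_user, seed)   # resource received via users
    f_pt = diffuse(item_tag, seed)   # resource received via tags
    return [lam * x + (1 - lam) * y for x, y in zip(f_p, f_pt)]


# Made-up data: 3 items, 2 users, 2 tags. Item 2 has no users yet but shares tag 0 with item 0.
item_user = [[1, 1], [0, 1], [0, 0]]
item_tag = [[1, 0], [0, 1], [1, 0]]
seed = [1, 0, 0]  # the target user collected item 0 only

pure_user_item = tag_aware_scores(item_user, item_tag, seed, lam=1.0)
mixed = tag_aware_scores(item_user, item_tag, seed, lam=0.5)
```

With `lam=1.0` the uncollected item 2 scores zero because no user has collected it, whereas with
`lam=0.5` it receives 0.25 through the shared tag. This is one way to see why tags can help with
cold items.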

Recently, a variety of researchers have designed tag-aware algorithms by modifying the above
model. Shang et al. [60] proposed a user-centric diffusion-based similarity, which considers users as
the communication hubs to measure the coincidence among users, and found that it could yield more
accurate recommendations. In addition, the tag usage frequency has been used as an edge weight in
user-item bipartite networks. Shang and Zhang [61] directly regarded the frequency as the weight and
applied the diffusion method to improve the recommendation accuracy. Wu and Zhang [62] viewed the tag
usage pattern in a document-vocabulary manner and applied the term frequency-inverse document
frequency (TF-IDF) model [63] to calculate the weight of user-item relations. They found that this
weighting method could enhance the recommendation diversity. Furthermore, Zhang et al. [30] took such
tag usage frequency into account on the user-tag relation and then spread the tag-based preferences
to all the neighboring items of the corresponding tags. The numerical results showed that this could
significantly enhance the algorithmic accuracy for relatively inactive or new users, and also that
different tag usage patterns might result in different algorithmic diversity: the more diverse the
topics of the tags users like, the more diverse the results the algorithm generates. Consequently,
the two fundamental roles of tags [52], [64], describing and retrieving items, first found
applications in recommender systems. More recently, Liang et al. [65] noticed that the above methods
decompose the user-item-tag relationships into two bipartite graphs to make recommendations, which to
some extent ignores the remaining binary relation (e.g. user-tag for [35], user-item for [30]). As a
result, after further eliminating the noise of tags, they used the semantic meaning of tags to
represent the topic preferences of users and combined it with the item preferences of users to
measure user-based similarity. Subsequently, the hybrid similarity was used in a standard
collaborative filtering framework to obtain better Recall results on two datasets: Amazon.com and
CiteULike.com. Similar measurements of user-based and item-based similarities have also been widely
applied in various studies [66], [67].



3.3 Tensor-based Models 

Recently, tensor factorization (TF) [68] based methods have attracted increasing attention
for designing recommendation algorithms in social tagging systems
[691 El EH ED I3H] • Generally, by using ten- 
sor, a ternary relation, A = {u,i,t}, can be 
represented as [70] 



$$ a_{(u,i,t)} = \begin{cases} 1, & \text{if } (u,i,t) \in Y, \\ 0, & \text{otherwise.} \end{cases} \tag{7} $$



Other studies define missing values for the empty triples in which the item has never been
tagged, while negative values are set for the triples in which the item is tagged in other existing
triples [39]. Fig. 3 illustrates the above two definitions.
Table 1. Comparison of algorithmic accuracy, measured by the AUC. Pure U-I and Pure I-T denote the
pure diffusions on user-item bipartite graphs and item-tag bipartite graphs, corresponding to
$\lambda = 1$ and $\lambda = 0$, respectively. The optimal values of $\lambda$ as well as the
corresponding optima of AUC are presented for comparison.

Fig. 3. Illustration of the tensor-based ternary relation of tag assignments. Left panel: for a
visible tag assignment, $a_{(u,i,t)}$ is set to 1, and 0 otherwise [70]. Right panel: $a_{(u,i,t)}$
is set negative for a triple whose item is tagged in other existing triples rather than in
$(u,i,t)$ itself. Missing values are given to the other empty triples.



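For the purpose of recommendation, $Y$ can be represented by three low-rank feature matrices,
$\hat{U}$, $\hat{I}$, $\hat{T}$, and one core tensor, $\hat{C}$, as

$$ \hat{Y} = \hat{C} \times_u \hat{U} \times_i \hat{I} \times_t \hat{T}, \tag{8} $$

where the core tensor $\hat{C}$ and the feature matrices $\hat{U}$, $\hat{I}$ and $\hat{T}$ are the
parameters to be learned, and $\times_x$ is the tensor $x$-mode multiplication between a tensor and a
matrix [49]. In addition, the sizes of the core tensor and the feature matrices are:

$$ \hat{C} \in \mathbb{R}^{k_U \times k_I \times k_T}, \quad \hat{U} \in \mathbb{R}^{|U| \times k_U}, \quad \hat{I} \in \mathbb{R}^{|I| \times k_I}, \quad \hat{T} \in \mathbb{R}^{|T| \times k_T}, \tag{9} $$

where $k_U$, $k_I$, $k_T$ are the latent dimensions of the low-rank approximations for users, items
and tags, respectively. Then, recommendations can be generated as

$$ \hat{y}_{(u,i,t)} = \sum_{\tilde{u}} \sum_{\tilde{i}} \sum_{\tilde{t}} \hat{c}_{(\tilde{u},\tilde{i},\tilde{t})} \cdot \hat{u}_{(u,\tilde{u})} \cdot \hat{i}_{(i,\tilde{i})} \cdot \hat{t}_{(t,\tilde{t})}, \tag{10} $$

where the tilde denotes the feature dimensions and the hat indicates the elements of the feature
matrices. Finally, the personalized recommendation list of items or tags is displayed to the target
user in descending order of score.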
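
Once the core tensor and the feature matrices are available, scoring a single (user, item, tag)
triple is a triple sum over the latent dimensions. The sketch below uses nested lists and made-up
parameter values; learning the parameters is not shown, and the function name `tucker_score` is our
own.

```python
def tucker_score(core, U, I, T, u, i, t):
    """Predicted score of the triple (u, i, t) from a core tensor and three feature matrices.

    core: k_U x k_I x k_T nested list; U, I, T: feature matrices with one row per entity.
    """
    return sum(
        core[a][b][c] * U[u][a] * I[i][b] * T[t][c]
        for a in range(len(core))
        for b in range(len(core[0]))
        for c in range(len(core[0][0]))
    )


# Made-up parameters with latent dimensions k_U = 2, k_I = 2, k_T = 1.
core = [[[1.0], [0.0]], [[0.0], [2.0]]]
U = [[1.0, 0.0], [0.0, 1.0]]   # 2 users
I = [[1.0, 0.0], [1.0, 1.0]]   # 2 items
T = [[0.5], [1.0]]             # 2 tags

# Rank the tags for user 1 and item 1 in descending order of predicted score.
tag_scores = {t: tucker_score(core, U, I, T, 1, 1, t) for t in range(len(T))}
ranked_tags = sorted(tag_scores, key=tag_scores.get, reverse=True)
```

With these toy values tag 1 scores 2.0 and tag 0 scores 1.0 for user 1 and item 1, so tag 1 would be
recommended first.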

Tensor factorization is based on Singular Value Decomposition (SVD) [72], with which the ternary
relation can be reduced to low dimensions and hence is easier to process for recommendation. [44]
used it in a way that corresponds to a TF model optimized for square loss, where all unobserved
values are learned as 0s. Furthermore, [70] developed a unified framework to model the three types of
entities. The three-order tensor decomposition was then performed by combining the Higher Order
Singular Value Decomposition (HOSVD) [73] method and the Kernel-SVD [74], [75] smoothing technique on
two real-world datasets: Last.fm and BibSonomy. The results showed improvements in Recall and
Precision. [49] proposed a better learning approach for TF models, which optimizes the model
parameters for the AUC values. The optimization of this model is related to Bayesian personalized
ranking (BPR) proposed in [75]. Both try to optimize over pairs of ranking constraints, where the
former focuses on AUC optimization and the latter optimizes for pair classification. [77] discussed
the relationship between them in detail.

3.4 Topic-based Models 

Generally, the core challenge of recommender systems is to estimate the likelihood that a user is
interested in an item. In the last two decades, much effort has been devoted to building various
models to measure such probabilities in information retrieval. Deerwester et al. [78] proposed Latent
Semantic Analysis (LSA), which uses a term-document matrix describing the occurrences of terms in
documents. Normally, each element in the matrix is weighted by TF-IDF [63], revealing the importance
of the term in its corresponding document. In addition, Hofmann [79] introduced Probabilistic Latent
Semantic Analysis (PLSA) to improve recommendation quality for various settings by assuming a latent
lower-dimensional topic model as the origin of the observed co-occurrence distributions. Compared
with standard LSA, PLSA is based on a mixture decomposition derived from a latent topic model, which
results in a more principled approach with a solid statistical foundation. Eq. (11) gives the
formulation of PLSA:

$$ P(w,d) = \sum_{z} P(z) P(d|z) P(w|z) = P(d) \sum_{z} P(z|d) P(w|z), \tag{11} $$

where word $w$ and document $d$ are both generated from the latent topic $z$: the topic is chosen
conditionally on the document according to $P(z|d)$, and a word is then generated from that topic
according to $P(w|z)$. However, PLSA does not allocate a topic distribution to each document, which
might potentially lose information for documents with multiple subjects. Therefore, a more widely
used model, Latent Dirichlet Allocation (LDA) [80], was proposed to overcome this issue by allowing
multiple latent topics, with an a priori Dirichlet distribution (a conjugate prior of the multinomial
distribution), to be assigned to each single document. Besides, LDA assumes that documents are
represented as random mixtures over latent topics, each of which is given by a distribution over
words. For each document $d$ in the collection $D$, LDA works as follows (see Fig. 4):



Fig. 4. (Color online) Illustration of the generative process of the LDA model (from
wikipedia.org), where $\alpha$, $\beta$, $\theta$ are parameters to be learned, $z$ is the latent
topic variable, $w$ is the observed variable of words, and the direction of the arrows indicates the
process flow.

(i) Choose $\theta_i$ from $\mathrm{Dir}(\alpha)$, where $i$ runs over the document collection;
(ii) for each word $w_{ij}$ in document $d_i$, choose a latent topic
$z_{ij} \sim \mathrm{Multinomial}(\theta_i)$ and then choose a word
$w_{ij} \sim \mathrm{Multinomial}(\beta_{z_{ij}})$. Finally, after learning the parameters by Gibbs
sampling [81] or the expectation-maximization (EM) algorithm [82], the probability of the document
collection can be given as

$$ P(D|\alpha,\beta) = \prod_{i} \int p(\theta_i|\alpha) \left( \prod_{j} \sum_{z_{ij}} P(z_{ij}|\theta_i) P(w_{ij}|z_{ij},\beta) \right) d\theta_i. \tag{12} $$
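
The generative steps (i) and (ii) can be sketched with the standard library alone. The Dirichlet
draw is obtained by normalizing independent Gamma variates, which is a standard construction; the
function names, the toy parameters and the fixed seed are our own assumptions, and parameter
learning (Gibbs sampling or EM) is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities (a single multinomial trial)."""
    r, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r < acc:
            return k
    return len(probs) - 1  # guard against rounding in the cumulative sum


def generate_document(alpha, beta, n_words, rng):
    """Generate one document as a list of (topic, word) pairs.

    alpha: Dirichlet parameters, one per topic; beta[z]: word distribution of topic z.
    """
    theta = sample_dirichlet(alpha, rng)           # step (i): topic mixture of the document
    doc = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)         # step (ii): latent topic of the word
        w = sample_categorical(beta[z], rng)       # step (ii): word drawn from that topic
        doc.append((z, w))
    return theta, doc


# Made-up model: 2 topics over a vocabulary of 3 word ids.
rng = random.Random(0)
alpha = [0.5, 0.5]
beta = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
theta, doc = generate_document(alpha, beta, n_words=20, rng=rng)
```

In a tagging system the "words" would typically be tags and the "documents" items or users, so the
same sketch covers the tag-generation view used by the LDA-based recommenders discussed in this
section.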

Recently, these topic-based models have been applied in social tagging systems for both tag and
item recommendations. In [15], [83], a PLSA-based hybrid approach was proposed that unifies user-item
and item-tag co-occurrences to provide better item recommendations. These two works measured the
co-occurrence probabilities of user-item and item-tag pairs by summing over the latent topic
variables, and then maximized the likelihood of the fused scenarios.

Comparatively, LDA is more widely used
for tag recommendation. Xi et al. [84] employed
LDA to elicit topics from the words in documents,
as well as from the co-occurring tags, where
words and tags form independent vocabulary
spaces, and then recommended tags for target
documents. Krestel et al. [85, 86], on the other
hand, used LDA to extract hidden topics from
the available tags of items and then recommended
tags from these latent topics. Bundschus et al.
[87] integrated both user information and tag
information into the LDA algorithm. Its generative
process extracted user-specific latent topics
using a Topic-Tag Model, which adds tags, and a
User-Topic-Tag Model, which adds the user layer.
It assumed that users had a multinomial
distribution over topics, so that the users'
interests could be modeled by each tag
assignment. Finally, they used two-step latent
topic realizations (user-item based and tag-based
topics) to provide personalized tag
recommendations. In addition, Bundschus et al.
[88] summarized different topic modeling
approaches with respect to their ability to model
annotations. Rather than applying Bayes' rule
to decompose the joint probability of item-tag
and user-tag co-occurrence, Harvey et al.
[89] introduced a similar LDA-based approach
for tag recommendation that decomposes the
joint probability of latent topics given the tag
assignments. Furthermore, Li et al. [90] combined
LDA with the GN community detection algorithm
[91, 92] to observe the topic distributions
of communities, as well as how communities
evolve over time in social tagging systems. On
this basis, they found that users in the same
community tended to be interested in similar
topics, which would shed some light on
recommendation for groups.
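
Recommending tags from latent topics can be sketched as a simple
ranking step, assuming a topic model has already been fitted. The
sketch scores each candidate tag by summing over topics, P(tag | item)
= sum_z P(tag | z) P(z | item), and returns the best tags that the
item does not yet carry. The function name, its arguments and the
tie-breaking by tag name are illustrative assumptions and do not
reproduce any particular cited method.

```python
def recommend_tags(p_topic_given_item, p_tag_given_topic, tag_names,
                   existing_tags=(), top_k=3):
    """Rank candidate tags for one item by their topic-mixture probability.

    p_topic_given_item -- P(z | item), one value per topic
    p_tag_given_topic  -- p_tag_given_topic[z][t] is P(tag t | topic z)
    tag_names          -- tag_names[t] is the label of tag index t
    existing_tags      -- tags already assigned to the item (skipped)
    """
    scores = {}
    for t, name in enumerate(tag_names):
        if name in existing_tags:
            continue
        # Sum over the latent topics to obtain P(tag | item).
        scores[name] = sum(pz * p_tag_given_topic[z][t]
                           for z, pz in enumerate(p_topic_given_item))
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]
```

A personalized variant could, under similar assumptions, replace
P(z | item) with a mixture of item-specific and user-specific topic
distributions.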



4 Conclusions and Outlook 

In this survey, we summarized the progress
of studies on tag-aware recommender systems
(RS), emphasizing the recent contributions
by both statistical physicists and computer
scientists in three aspects: (i) network-based
methods; (ii) tensor-based methods; (iii)
topic-based methods. Generally, there is no single
method that can fully address all the problems
existing in RS. Network-based and tensor-based
methods can overcome the sparsity of
large-scale data, and hence can be used for
designing efficient algorithms. However, they
focus only on the network structure and do not
consider the relations among tags. Comparatively,
topic-based methods can group tags into
different topics, and hence can produce
more meaningful and understandable
recommendations. However, since most topic-based
methods use machine learning to iteratively
refine the results, they require high-performance
hardware and consume more computation time.
A similar problem arises in the dimension
reduction process of tensor-based methods.
Therefore, a unified model might be considered
to make full use of their advantages and provide
a more promising method for tag-aware
recommender systems.

Although RS is no longer a new problem in
information science, the advent of new Web 2.0
paradigms brings versatile tools and information
that help build better recommendation models
by integrating traditional methods. Recent
studies of complex networks can benefit
tag-based algorithms, because an in-depth
understanding of network structure, user
behaviors and network dynamics can be used to
design advanced tag-aware recommendation
algorithms (e.g., making use of the information
about the hypergraph [52, 93] and tripartite graph
[94, 95] representations of social tagging
networks to better predict underlying interests).
On the other hand, tag-based algorithms can also
help trend detection [96] over time.

Although studies of tag-aware recommender
systems have achieved fruitful results,
there are still challenges, as well as some new
directions, that remain to be solved (discovered)
in the future: (i) the complete hypergraph [51, 52]
should be considered in order to preserve the
integrity of tagging networks without decomposing
any information, and is thus a promising way
to provide recommendations with better
performance; (ii) most current related research
emphasizes recommending a single type of node,
whereas predicting joint node pairs (e.g.,
item-tag pairs [97]) has comparatively been
little studied. Joint pair recommendation would
reflect more personalized preferences, and hence
be a new application of tag-aware
recommender systems; (iii) since tags are
freely assigned by users, the added tags contain
much noise. Tag clustering methods [10, 98, 99]
and anti-spam techniques [100] would both be
promising ways to reduce the noise and help
provide high-quality recommendations; (iv)
probability-based models are mainly used to
provide tag recommendations in most recent
research, while how to use them to benefit item
recommendations is still an open challenge.
In addition, those models would also help to
prevent rumor spreading [101, 102] and to detect
trends [103]; (v) a multi-layered network [104]
consisting of user social interactions, tag
co-occurrence relations and user-item-tag ternary
information can be considered to describe the
hierarchical structure of social tagging systems,
so that techniques based on Social Network
Analysis (SNA) [105] and social influence
[106, 107, 108] can be used to provide more
substantial recommendations, as well as social
predictions [109, 110]; (vi) most tagging
platforms are dynamical systems that evolve over
time [111, 112], so the study of human dynamics
[113] in analyzing temporal behaviors and
interests can help provide real-time
recommendations [114, 115].